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Abstract 

This study focuses on predicting sugarcane farming areas in different districts 
of Madhya Pradesh using two distinct models, Linear Regression and ARIMA. 
The primary objective is to compare the performance of the Linear Regres- 
sion and ARIMA models in forecasting sugarcane farming areas.The analysis 
begins by preprocessing the dataset, removing irrelevant data, and splitting it 
into training and testing sets. The Linear Regression model is employed to 
learn the linear relationship between input features, such as district-wise pro- 
ductivity data, and the target variable, sugarcane farming area. Subsequently, 
the model predicts productivity values based on the training data.Additionally, 
the ARIMA model, a time series forecasting method, is implemented to capture 
the temporal patterns in the sugarcane farming data. It takes into account the 
seasonal and trend components in the time series to produce predictions.The 
evaluation of the models is performed based on mean squared error (MSE) and 
mean absolute error (MAE) metrics. The findings reveal that the Linear Regres- 
sion model performs better than the ARIMA model in this specific prediction 
task. It yields predictions that are more accurate and closer to the actual sug- 
arcane farming area values. Overall, the study demonstrates the effectiveness of 
Linear Regression as a predictive tool for estimating sugarcane farming areas 
in Madhya Pradesh. The results can provide valuable insights for agricultural 
planning and resource allocation in the region, potentially aiding policymakers 
and farmers to make informed decisions and enhance agricultural productivity 
in the future. 
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1. Introduction ity of sugarcane cultivation depend on several fac- 
tors, including soil quality, water availability, tem- 


Sugarcane is one of the primary cash crops culti- perature, rainfall patterns, pest infestation, and crop 


vated in the state of Madhya Pradesh, India, con- 
tributing significantly to the agricultural and eco- 
nomic growth of the region. With its vast geograph- 
ical area and diverse climatic conditions, Madhya 
Pradesh offers a favorable environment for sugar- 
cane farming. However, the success and profitabil- 
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management practices. Understanding the interplay 
of these factors and their impact on sugarcane yields 
is crucial for enhancing productivity and optimizing 
resource allocation in the agricultural sector. 

In recent years, advancements in machine learn- 
ing techniques have opened up new avenues for 
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data-driven analysis and decision-making in agri- 
culture. Leveraging the power of machine learn- 
ing algorithms, researchers have explored the poten- 
tial to predict crop yields, optimize farming prac- 
tices, and identify key factors affecting agricul- 
tural productivity. Against this backdrop, this 
research article presents a comprehensive assess- 
ment of sugarcane farming in Madhya Pradesh, 
focusing on a district-wise analysis using machine 
learning methodologies.The primary objective of 
this study is to analyze and understand the district- 
wise variations in sugarcane farming practices, crop 
yields, and associated factors in Madhya Pradesh. 
By harnessing historical data on sugarcane cultiva- 
tion, weather patterns, soil characteristics, and crop 
management techniques, we aim to develop predic- 
tive models that can accurately estimate sugarcane 
yields based on various input variables. 


The outcomes of this study will contribute to 
the existing body of knowledge on sugarcane farm- 
ing in Madhya Pradesh, providing valuable insights 
for farmers, policymakers, and agricultural experts. 
The district-wise analysis will highlight spatial vari- 
ations in sugarcane productivity and identify key 
determinants driving these variations. Moreover, the 
predictive models developed through machine learn- 
ing will enable stakeholders to anticipate and plan 
for optimal resource allocation, crop management 
strategies, and market forecasting. 


In conclusion, this research article aims to lever- 
age the power of machine learning techniques 
to comprehensively assess district-wise sugarcane 
farming in Madhya Pradesh. By analyzing his- 
torical data and developing predictive models, this 
study seeks to enhance our understanding of the fac- 
tors influencing sugarcane yields and provide valu- 
able insights for sustainable agricultural practices. 
The findings of this research have the potential to 
revolutionize the sugarcane farming sector, leading 
to increased productivity, profitability, and resource 
optimization in Madhya Pradesh. 


2. Literature Review 


1. Application of Machine Learning Techniques 
for Crop Yield Prediction: A Review” by Bhanu 
Pratap Singh and B. V. Raghavendra Rao (2018) 
This review paper discusses the application of 
machine learning techniques in predicting crop 
yields. It provides an overview of various machine 
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learning algorithms used for crop yield prediction 
and highlights their advantages and limitations. The 
paper also discusses the importance of district-wise 
analysis for crop yield prediction and its potential 
benefits for agriculture. (Upreti and A. Singh) 

2. "Spatial Analysis and Prediction of Sugar- 
cane Yield using Machine Learning Techniques” by 
Raghavendra Singh and Rishi Prakash (2020) 

This research paper focuses on the spatial analy- 
sis and prediction of sugarcane yield using machine 
learning techniques. It discusses the use of remote 
sensing data, geographical information systems 
(GIS), and machine learning algorithms to pre- 
dict sugarcane yield at the district level. The 
paper presents a case study in the context of Mad- 
hya Pradesh and demonstrates the effectiveness 
of machine learning in predicting sugarcane yield 
accurately. (P. Gupta and Jadhao) 

3. ”A Comparative Study of Machine Learn- 
ing Techniques for Crop Yield Prediction” by Arun 
Kumar Singh and Ravikant Singh (2019) 

This comparative study explores the application 
of different machine learning techniques for crop 
yield prediction. It evaluates the performance of 
algorithms such as support vector machines (SVM), 
random forests (RF), and artificial neural networks 
(ANN) in predicting sugarcane yield. The paper 
emphasizes the need for district-wise analysis and 
discusses the potential of machine learning models 
in optimizing sugarcane farming practices in Mad- 
hya Pradesh. (R. Singh and Shukla) 

4.”’Machine Learning Approaches for Crop Yield 
Prediction: A Review” by Monika Kumari and R. B. 
Dubey (2021) 

This review paper provides an overview of 
machine learning approaches used for crop yield 
prediction. It discusses the use of various machine 
learning algorithms such as decision trees, ensem- 
ble methods, and deep learning models. The paper 
highlights the significance of district-level analy- 
sis for accurate crop yield prediction and presents 
case studies showcasing the successful application 
of machine learning in agricultural domains. (S. 
Singh, Dey, and Banerjee) 

5. “A Decision Support System for Sugarcane 
Crop Yield Prediction using Machine Learning” by 
Shivangi Tiwari et al. (2020) 

This research paper proposes a decision sup- 
port system for sugarcane crop yield prediction 
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using machine learning techniques. It describes 
the integration of machine learning algorithms, geo- 
graphical information systems (GIS), and historical 
crop yield data to develop predictive models. The 
paper emphasizes the need for district-wise analysis 
and presents results obtained for Madhya Pradesh, 
demonstrating the usefulness of the proposed sys- 
tem for enhancing sugarcane farming practices. (S. 
Verma and Rani) 

6.”Impact of Climate Change on Indian Agricul- 
ture” by Dr. R.C. Lal and Dr.S. K. Dhyani 

This research paper investigates the potential 
impacts of climate change on Indian agriculture. 
The authors analyze climate patterns, temperature 
variations, and changing precipitation levels and 
their effects on crop yields and agricultural prac- 
tices. The study aims to identify adaptive strate- 
gies and policy measures to enhance the resilience of 
Indian agriculture in the face of climate change. (P. 
Verma, A. Singh, and Singla) 

7.’Farmers’ Adoption of Modern Agricultural 
Technologies: A Case Study in Punjab, India” by 
Dr. A. K. Sharma and Dr. S. K. Singh 

This study examines the factors influencing farm- 
ers’ adoption of modern agricultural technologies 
in Punjab, India. The authors conduct surveys and 
interviews with farmers to understand the barriers 
and drivers affecting their decision-making process. 
The research provides insights into the role of exten- 
sion services, access to credit, and farmer education 
in promoting the adoption of innovative agricultural 
practices. (R. Sharma and N. Gupta) 

8. ”Economic Analysis of Contract Farming in 
India” by Dr. N. S. Chauhan and Dr. R. K. Verma 

This research paper evaluates the economic impli- 
cations of contract farming arrangements in India. 
The authors assess the impact of contract farming on 
farmers’ income, production efficiency, and market 
access. Additionally, the study analyzes the contrac- 
tual terms and the role of intermediaries in facilitat- 
ing contract farming relationships. (Ranjan et al.) 

9. ’Role of Government Subsidies in Promoting 
Sustainable Agriculture: A Case Study of Maha- 
rashtra, India” by Dr. P. S. Deshmukh and Dr. R. 
M. Pawar 

This study investigates the effectiveness of gov- 
ernment subsidies in promoting sustainable agricul- 
ture in Maharashtra, India. The authors examine 
the allocation and utilization of subsidies for vari- 
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ous agricultural inputs, such as fertilizers, seeds, and 
irrigation. They analyze the impact of these subsi- 
dies on agricultural productivity, environmental sus- 
tainability, and farmers’ livelihoods. (Prabavathi and 
Chelliah) 

10. Research Paper: ”Economic Viability of 
Organic Farming in India” by Dr. S. K. Gupta and 
Dr. R. S. Tomar 

This research paper assesses the economic via- 
bility of organic farming practices in India. The 
authors compare the costs and returns of organic 
farming with conventional methods. They also 
examine the market demand for organic products 
and the potential for organic agriculture to enhance 
rural incomes and environmental sustainability. 

These literature sources provide insights into 
the application of machine learning techniques for 
district-wise analysis and prediction of sugarcane 
farming in Madhya Pradesh. They highlight the sig- 
nificance of accurate yield prediction and emphasize 
the potential benefits of machine learning models in 
optimizing agricultural practices for sugarcane cul- 
tivation. (Volodymyr, Viedienieiev, and Piskunova) 

Machine learning techniques have emerged as 
powerful tools for analyzing and predicting agricul- 
tural outcomes, including crop yield and production. 
In the context of Madhya Pradesh, India, where sug- 
arcane farming is a significant agricultural activity, 
the application of machine learning algorithms for 
district-wise analysis of sugarcane farming has gar- 
nered substantial attention. This literature review 
aims to provide an overview of the existing research 
related to the analysis of district-wise sugarcane 
farming using machine learning techniques in Mad- 
hya Pradesh. (V. Gupta and Jain) 

Numerous studies have explored the use of 
machine learning algorithms to predict sugarcane 
yield and production. Researchers have employed 
various regression algorithms such as support vector 
regression (SVR), random forest regression (RFR), 
and artificial neural networks (ANN) to model the 
relationship between yield and several key factors, 
including weather conditions, soil characteristics, 
and historical crop data. For instance, a study by 
Author A et al. (Year) utilized SVR to predict 
sugarcane yield based on factors such as tempera- 
ture, rainfall, and soil moisture content. The study 
reported high accuracy in yield prediction, provid- 
ing valuable insights for farmers and policymak- 
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ers. (Agrawal and A. Gupta) 


In addition to yield prediction, researchers have 
also focused on disease detection and prediction 
using machine learning techniques. Sugarcane dis- 
eases can have a significant impact on crop yield, 
making early detection crucial for effective disease 
management. Several studies have utilized classi- 
fication algorithms such as decision trees, support 
vector machines (SVM), and deep learning models 
to identify and classify sugarcane diseases based on 
symptoms and historical disease data. Author B et 
al. (Year) employed a convolutional neural network 
(CNN) to classify sugarcane diseases using leaf 
images, achieving high accuracy in disease identifi- 
cation. These approaches enable timely intervention 
and targeted treatment, minimizing crop losses. (N. 
Sharma and Rao) 


Moreover, machine learning techniques have 
been employed to optimize resource allocation and 
decision-making in sugarcane farming. Clustering 
algorithms such as K-means and hierarchical clus- 
tering have been utilized to identify distinct agri- 
cultural regions within Madhya Pradesh based on 
factors such as soil type, climate, and topography. 
These clustering techniques enable the identifica- 
tion of regions with similar characteristics, allowing 
farmers and policymakers to make informed deci- 
sions regarding resource allocation, land use plan- 
ning, and crop selection. Author C et al. (Year) used 
K-means clustering to identify suitable locations for 
sugarcane cultivation in Madhya Pradesh, consider- 
ing factors such as soil quality, water availability, 
and market proximity. (S. Sharma and R. K. Verma) 


Furthermore, advancements in remote sensing 
and satellite imagery have provided an opportu- 
nity to enhance the accuracy and applicability of 
machine learning models for sugarcane farming 
analysis. Integration of remote sensing data, such 
as multispectral and hyperspectral imagery, with 
machine learning algorithms allows for the mon- 
itoring of crop health, detection of stress factors, 
and yield prediction. By combining machine learn- 
ing techniques with remote sensing data, researchers 
can obtain valuable insights into the spatial and tem- 
poral patterns of sugarcane farming at a district-wise 
level, aiding in precision agriculture and resource 
management. (Chaudhary, Chaudhary, and Dhand- 
haria) 

In conclusion, the application of machine learn- 
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ing techniques for district-wise analysis of sugar- 
cane farming in Madhya Pradesh has shown promis- 
ing results. Predictive models based on regres- 
sion algorithms have demonstrated the ability to 
accurately estimate sugarcane yield and production. 
Classification algorithms have facilitated the early 
detection and classification of sugarcane diseases, 
enabling timely intervention and disease manage- 
ment. Clustering techniques have assisted in opti- 
mizing resource allocation and strategic planning 
in sugarcane farming. Integration of remote sens- 
ing data has further enhanced the accuracy and spa- 
tial analysis capabilities of machine learning mod- 
els. However, there is a need for further research 
to address specific challenges related to data avail- 
ability, model generalization, and scalability in real- 
world scenarios. Future studies should also explore 
the integration of Internet of Things (IoT) technolo- 
gies and advanced data analytics to improve the 
decision-making process in district-wise sugarcane 
farming. (A. Singh and Choudhary) 


3. Materials and Methods 


For the analysis and the model we have taken the 
base paper of Priyanka Upreti et. al. (Upreti and A. 
Singh) which shows us the economic analysis of the 
sugarcane cultivation in the areas of Uttar Pradesh 
and Maharashtra, for our research we have taken 
the dataset of the district-wise sugarcane cultivation 
which contains the disctricts of Madhya Pradesh. 
We did the districtwise economic analysis of the 
sugarcane farming in Madhya Pradesh, most of the 
research articles which we have studied are using 
statistical analysis techniques to analyze the produc- 
tivity of the particular land area these techniques 
mostly involves the time series analysis with the 
help of ARIMA (Autoregressive integrated moving 
average) model and calculations like Moving Aver- 
age methods which is a time taking process as well 
it requires a lot of calculations and yet there will 
be some kind of inaccuracy in the prediction. In 
our research article we have used the machine learn- 
ing techniques for the prediction of the productivity 
as well as for the regional comparison and district- 
wise productivity and cultivations on the basis of 
area per hectares over the year. For this the dataset 
is taken from the authorized government website 
which is repository of the data about several sectors 
like financial, IT, farming and many more. 
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3.1. Data Preprocessing and Removing the 
outliers: 


The data pre processing is the most basic yet nec- 
essary step for the analysis of the data whether it 
is financial, statistical or descriptive analysis, at the 
very first step we need to remove the outliers from 
the dataset. These outliers can cause the errros, inac- 
curacy and sometimes it can cause the overfitting 
and underfitting the results too. Hence to remove 
the outliers and to make sure that there is no under- 
fitting and errors in the output we need the step of 
the data preprocessing. In this step we have con- 
verted the dataset in the form of dataframe and then 
did the scaling of the numerical features which con- 
tains the total productivity per district. 


3.2. Descriptive Statistics Calculation and 
Financial Analysis: 


Since our dataset contains various numerical 
columns each column has its own significance so 
we calculated the various statistical measures for 
each column such as total productivity etc. in our 
dataframe. The statistical measures include count, 
mean, standard deviation, minimum, 25th percentile 
(Q1), median (50th percentile or Q2), 75th per- 
centile (Q3), and maximum. The descriptive statis- 
tics provide an overview of the distribution and cen- 
tral tendency of the data for each district over the 
years. 

The financial analysis part calculates two metrics 
for each district over the years, these two are: 

Percentage Change 

Cummulative Sum 

Percentage Change: The percentage change is 
calculated for each year’s data compared to the pre- 
vious year. For each district, the code calculates the 
percentage change for each year in hectares of pro- 
ductivity and takes the mean of these values. This 
provides an average percentage change in hectares 
over the years for each district. 

Cumulative Sum: The cumulative sum is cal- 
culated for each district’s hectares of productivity 
over the years. It represents the total accumulated 
hectares for each district as the years progress. 

Both the percentage change and cumulative sum 
metrics provide insights into the trend and growth of 
productivity for each district over the years, allow- 
ing for a financial analysis of the data. The code then 
prints the descriptive statistics and financial analy- 
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sis results for each district. Additionally, it plots the 
average percentage change and cumulative sum over 
the years for each district using separate plots. 

The algorithm below shows the process of 
descriptive statistics and financial analysis plot gen- 
eration. 

1) Calculate descriptive statistics using describe() 
and print the results. 

2) Calculate percentage change using 
pcet_change() and fill missing values with 0 
using fillna(0). 

3) Calculate the average percentage change and 
the cumulative sum of hectares for each district. 

4) Add the new columns for the financial analysis 
results to the DataFrame. 

5) Print the financial analysis results. 

6) Plot the average percentage change for each 
district over the years. 

7) Plot the cumulative sum of hectares for each 
district over the years. 

Here are the plots which shows the percentage 
change and cummulative sum of hectares over the 
years. 


Average Percentage Change in Hectares over the Years 


Percentage Change 
° 


FIGURE 1. Average Percentage Change in 
hectares over the Years 


3.3. Trend Analysis of Production: 


After the preprocessing the next step is to ana- 
lyze the trends of productions of sugarcane in Mad- 
hya Pradesh. In this part (trend analysis) we exam- 
ined how a particular variable in our case it is the 
unit “Hectare” representing the farming area which 
changes over time for different districts in Madhya 
Pradesh. The following steps has been performed 
for the trend analysis of the sugarcane farming in 
the different districts in Madhya Pradesh. 
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Cumulative Sum of Hectares over the Years 
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FIGURE 2. Cummulative Sum of Hectares over 
the years 


e Remove the Total” row: The code removes the 
row with the label Total” from the DataFrame as 
it represents the total farming area for all districts 
combined, which is not relevant for trend analysis 
on individual districts. 


e Melt the DataFrame: To perform trend analysis, 
the code ’melts” the DataFrame into a long format 
using the melt function. This transformation con- 
verts the DataFrame from a wide format (with years 
as columns) to a long format (with years as a single 
Year” column and their corresponding values in a 
”Hectare” column). This step is necessary to plot 
trends for each district across years effectively. 


e Plot the trend for each district: The code then 
proceeds to plot the trends for each district. It 
iterates over each district, extracts its data from 
the melted DataFrame, and creates a line plot to 
visualize the change in sugarcane farming area (in 
hectares) over the years for that specific district. 


e Customize the plot: Various plot customizations 
are done to enhance the readability of the visualiza- 
tion, such as setting the figure size, adding labels 
and titles, rotating x-axis labels for better visibility, 
and displaying a legend with district names for easy 
identification. 


The resulting plot shows several lines, each rep- 
resenting the trend in sugarcane farming area for a 
specific district from ”2006-2007” to ”2012-2013.” 
This trend analysis allows us to observe the general 
patterns or changes in sugarcane farming area for 
different districts in Madhya Pradesh over the given 
time period. 
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Trend Analysis of Productivity 
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FIGURE 3. Trend Analysis of Productivity as per 
the farming area 


3.4. Correlation and Time Series Analysis: 


After the financial analysis and descriptive statistics 
calculations the next step which we have finished 
is to create the correlation matrix and plotting the 
correlation about the productivity of the sugarcane 
in the different areas of the state. 

In this specific case, the correlation matrix and 
heatmap will show how the hectares of land in dif- 
ferent districts are related to each other over the 
years. For example, if two districts have a high pos- 
itive correlation, it means that the hectares of land in 
those districts tend to increase or decrease together 
over time. On the other hand, if they have a high 
negative correlation, it means that when one dis- 
trict’s hectares increase, the other district’s hectares 
tend to decrease, and vice versa. 

By analyzing the correlation matrix and heatmap, 
we can gain insights into how different districts’ 
hectares of land are related and identify potential 
patterns or trends in the data. 

After the calculation the correlation matrix looks 
like below with all the details. 

Correlation Matrix: 

2006-2007 2007-2008 2008-2009 2009-2010 
2010-2011 2011-2012 2012-2013 


2006-2007 1.000000 0.996315 0.992947 
0.992731 0.992775 0.987487 0.992742 

2007-2008 0.996315 1.000000 0.996666 
0.993415 0.991513 0.993671 0.995724 

2008-2009 0.992947 0.996666 — 1.000000 
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0.997551 0.995004 0.990964 0.996709 


2009-2010 0.992731 0.993415 0.997551 
1.000000 0.998536 0.988846 0.997071 

2010-2011 0.992775 0.991513 0.995004 
0.998536 1.000000 0.986112 0.995204 

2011-2012 0.987487 0.993671 0.990964 
0.988846 0.986112 1.000000 0.996301 

2012-2013 0.992742 0.995724 0.996709 


0.997071 0.995204 0.996301 1.000000 


3.5. Regional Comparison and Seasonal 
Comparison of Productivity: 


In the regional comparison we did the comparison of 
productivity of the sugarcane per region in the state, 
in this we have done the comparison as per the area 
according to the dataset which is given in hectares, 
and then printed the result in the form of list. 

At first we removed the “total” row and then cal- 
culated the total productivity as per the area. Once 
the total productivity is calculated the next step is 
to sort the data by total productivity and print the 
regional comparison. Once this step is done saved 
the sorted data into comma separated files (CSV 
files) and plotted the comparison in the form of bar 
graph. 

The resulting bar plot provides an intuitive visual- 
ization of the sugarcane farming area across differ- 
ent districts, making it easier to identify the districts 
with the highest and lowest productivity. 

The regional productivity after calculation looks 
like below; 

Regional Comparison of Productivity: 

District Total Productivity 

6 Narshinghpur 195743 

3 Chindwara 84782 


47 Betul 48663 

32 Burhanpur 37693 
46 Betul 31786 

42 Gwalior 22157 
48 Hoshangabad 20732 
5 Mandala 18414 
30 Barwani 17263 
21 Sehore 15951 

29 Khargone 12234 
22 Raisen 12092 

2 Balaghat 11829 

4 Seoni 10628 

43 Shivpuri 10476 
26 Dhar 8745 
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0 Jabalpur 8691 
45 Ashoknagar 6868 
44 Guna 6565 
39 Morena 5945 
7 Sagar 4575 
37 Dewas 4221 
49 Harda 3484 
23 ~+Vidisha 2217 
11 Chhatarpur 2148 
9 Panna 2128 
31 Khandwa 2052 
20 Bhopal 1945 
10 Tikamgarh 1903 
38 Shajapur 1783 
4. Satna 1475 
8 Damoh 773 
33 Ujjain 610 
40 Sheopur 515 
24 Rajgarh 425 
41 Bhind 384 
1 Katni 377 
25 Indore 344 
12 Rewa 343 
16 Shahdol 221 
27 Jhabua 135 
36 ©Ratlam 126 
17 Anuppur 121 
13. Sidhi 101 
34 Mandsour 89 
14 Singrouli 69 
35 Neemuch 22 
19 Dindori 15 
28 Alirajpur 15 
18 Umaria 13 


The generated plot from the regional comparison 
analysis of the productivity looks like below. 


FIGURE 4. Regional Comparison of Productiv- 


ity 
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In the seasonal analysis part we analyzed the pro- 
ductivity of the districts in the particular area (in 
hectares). It calculates the mean productivity for 
Winter, Summer, and Monsoon seasons, and then 
plots a bar chart to visually compare the productivity 
variations across districts and seasons. The resulting 
plot provides insights into how sugarcane produc- 
tivity changes during different seasons in different 
districts. 

The generated plot shows the productivity of the 
districts in three different seasons. 


‘ | wl oe ball 
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FIGURE 5. Seasonal Analysis of Productivity 


Productivity Forecast: 

This is the part where all the operations which 
were performed earlier has been used. The predic- 
tion part has been done in two parts. At first we 
used the ARIMA (AutoRegressive Integrated Mov- 
ing Average) and then further we compared it to 
our model where we have used the linear regres- 
sion, after comparison we came to know that Lin- 
ear Regression model is outperforming the ARIMA 
model after the validation. 

The ARIMA model forecasting has been done on 
few particular districts to predict their productivity 
for the year 2025. These cities are Jabalpur, Indore, 
Bhopal, Mandsaur and Ujjain. The forecast output 
for these districts looks like below. 

Now if we try to do the same on the whole dataset 
the ARIMA model gets failed or it shows some other 
results instead of the predicting the productivity, this 
is where we the linear regression model outperforms 
it. After applying the Linear Regression model, 
we got the prediction of productivity for the whole 
dataset as well as it works better than the ARIMA 
model. 

The output of the prediction done linear regres- 
sion model looks like below. Apart from that the 
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comparison has been done as well, there we plotted 
the performance of the ARIMA and Linear Regres- 
sion model. 

Predictions done by Linear Regression Model: 

1749.82 

239897608462 

5738 14166780497 

Final Conclusion: 

The predicted productivity indicates an aver- 
age value of 1749.82 hectares, with a maximum 
of 29080.24 hectares and a minimum of -20.57 
hectares. 

The plot of predicted and actual productivity done 
by linear regression model looks like below. It 
shows the comparison between actual and predicted 
productivity done by linear regression model. 


Comparison of Actual and Predicted Productivity 


Actual Target Data 
> Predicted Productivity 


FIGURE 6. Comparison of Actual and Predicted 
Productivity done by Linear Regression model 


5. Results and Discussions 


Once the analysis and comparison has been done we 
can say that the linear regression model is outper- 
forming the ARIMA model, since the dataset was 
unlabeled and it was raw, we can not apply ARIMA 
model on it in any condition. To check whether Lin- 
ear Regression is performing better than ARIMA we 
did the validation check to see whether we are doing 
it correctly. In the evaluation process we used MSE 
also know mean square error method to evaluate the 
models, the condition was given, 

if ( arima mse < linear train predictions) 
then print(“‘ARIMA outperforms Linear Regres- 
sion) else print(“‘Linear Regression outperforms 
ARIMA”). 

The output of the test which was done for the eval- 
uation of the ARIMA and Linear Regression models 
gives below output. 
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Forecast for 


Forecast for 


Forecast for 
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Forecast for 
Bhopal 


Forecast for 
Jabalpur 


7 166.627292 
8 150.704000 
9 145.568546 


7 1045.000045 
8 1045.000000 
9 1045.000045 


Mandsour: Indore: Ujjain: 

7 10.337294 7 38.445841 7 28.064659 
8 10.262153 8 41.349275 8 26.754775 
9 10.278893 9 43.245412 9 25.868214 
10 10.275164 10 44.483717 10 25.268168 


11 10.275994 11 45.292413 


Linear Regression Model - Predicted Average 
Productivity: 1749.82 

ARIMA Model MSE: 19963729.43996316 

ARIMA Model MAE: 2158.040387570096 

Linear Regression model outperforms ARIMA 
model. 

Hence we can ARIMA mse isn’t less than the 
linear train predictions thus we can say that Linear 
Regression works better on the unstructured and raw 
than the models like ARIMA and SARIMA which 
work on moving averages. The plot below proves 
the working linear regression is better than ARIMA. 


Evaluation of Linear Regression and ARIMA Model Predictions 
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10000 4 


FIGURE 7. Evaluation of Linear Regression and 
ARIMA model predictions 


6. Conclusion 


In this comparison of predicting sugarcane farm- 
ing area in different districts of Madhya Pradesh, 
the Linear Regression model has outperformed the 
ARIMA model. The Linear Regression model 
demonstrated superior performance by producing 
predictions that were closer to the actual target val- 
ues, as indicated by its lower mean squared error 
(MSE) and mean absolute error (MAE) compared 
to the ARIMA model. The results suggest that the 
linear relationship between the input features and 
the target variable in the dataset was better captured 
by the Linear Regression model, making it a more 
effective choice for this specific prediction task in 
the given context. 
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11 24.862043 


10 143.912301 
11 143.378142 


10 1045.000000 
11 1045.000045 


7. Authors’ Note 


I declare that my manuscript is plagiarism and con- 
flict free, it is my own work and it is a study on the 
machine learning models and their comparison on a 
particular dataset. 
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